Prefix search tree with partial key branching

A prefix index tree structure for locating data records stored through keys related to information stored in
the data records. Each node includes a prefix field for a prefix string of length p comprising the longest string
of key characters shared by all subtrees of the node, and a data record field for a reference to a data record
whose key is completed by the prefix string. A node may include one or more branch fields when the prefix
string is a prefix of keys stored in at least one subtree of the node, with a branch field for each distinct
(p+1)th key character in the keys, wherein each (p+1)th key character is a branch character. Each branch field
includes a branch character and a branch pointer field for a reference to a node containing at least one key
whose (p+1)th character is the branch character. Each node further includes a field for storing the number of
key characters in the prefix string and a field for storing the number of branch fields in the node. Also
disclosed are methods for constructing and searching a prefix index tree of the present invention, and for
inserting nodes into the tree and deleting nodes from the tree.

Background of the Invention



Field of Use 

The present invention relates generally to the indexing, or location, of information in a database through
the use of keys and, in particular, to a prefix search tree for indexing a database.



Prior Art

A recurring problem in databases, in particular those implemented in computer systems, is the search
for and location of specific items of information stored in the database. Such searches are generally
accomplished by constructing a directory, or index, to the database, and using search keys to search
through the index to find pointers to the most likely locations of the information in the database.

In its most usual form, the index to a database is structured as a tree comprised of one or more nodes
connected by branches. Each node generally includes one or more branch fields containing information for
directing a search, wherein each such branch field usually contains a pointer, or branch, to another node,
and an associated branch key indicating the ranges or types of information that may be located along that
branch from that node. The tree, and any search of the tree, begins at a single node referred to as the root
node and progresses downwards through the various branch nodes until the nodes containing either the items of
information or, more usually, pointers to the items of information are reached. The information related nodes
are often referred to as leaf nodes or, because this is the level at which the search either succeeds or fails,
failure nodes. It should be noted that any node within a tree is a root node with respect to all nodes
dependent from that node, and such sub-structures within a tree are often referred to as subtrees with
respect to that node.

The decision as to which direction, or branch, to take through a tree in a search is determined, at
each node encountered in the search, by comparing the search key or keys with the branch keys stored in
the node. The results of the comparisons determine which of the branches depending from a given node
are to be followed in the next step of the search. In this regard, search keys are most generally comprised
of strings of characters or numbers which relate to the item or items of information to be searched for. For
example, "search", "tree", "trees" and "search tree" could be keys to search a database index for
information relating generally to search trees, while "617" and "895" could be keys to find all telephone
numbers in the 895 exchange of the 617 area. The forms taken by the branch keys depend upon the type
of search tree, as described briefly below.

The prior art contains a variety of search tree structures, among which is the apparent ancestor from
which all later tree structures have been developed, and the most general form of search tree, the "B-tree".
A B-tree is a multi-way search tree wherein each node is of the form (A0,K0)...(Ai,Ki)...(An,Kn) and wherein
each Ai is a pointer to a subtree of that node and each Ki is a key value associated with that subtree. All key
values in the subtree pointed to by Ai are less than the key value of Ki+1, all key values in subtree An are
greater than Kn, and each subtree Ai may also be a multi-way search tree. The decision as to which branch
to take at a given node is performed by comparing the search key Kx to the branch keys Ki of the node and
following the pointer Ai associated with the lowest value key Ki which is larger than Kx; the search will follow
pointer A0 if Kx is less than all keys Ki and will follow pointer An if Kx is greater than key Kn.
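
The branch decision just described can be illustrated with a minimal sketch in Python. The function name
choose_branch, the list representation of a node and the sample values are assumptions made for this
illustration only; the sketch follows the pointer paired with the lowest branch key larger than the search
key and falls back to the last pointer when no branch key is larger.

```python
from bisect import bisect_right


def choose_branch(branch_keys, pointers, search_key):
    """Pick the pointer to follow at one multi-way node.

    branch_keys holds K0..Kn in ascending order and pointers holds A0..An.
    The pointer paired with the lowest branch key larger than search_key is
    followed; if no branch key is larger, the last pointer An is followed.
    """
    # Index of the first branch key strictly greater than the search key.
    i = bisect_right(branch_keys, search_key)
    # Clamp to the last pointer when the search key exceeds every branch key.
    return pointers[min(i, len(pointers) - 1)]


# Hypothetical node with three branch keys and three pointers.
keys = [10, 20, 30]
ptrs = ["A0", "A1", "A2"]
first = choose_branch(keys, ptrs, 5)    # below every key
middle = choose_branch(keys, ptrs, 15)  # between K0 and K1
last = choose_branch(keys, ptrs, 35)    # above every key
```

Because the comparison yields only a range of key values, the search cannot report a miss at this node; it
must continue down the selected branch.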

The next variant on the basic B-tree is the Binary Tree wherein each node is of the general form (Ai,
Ki, Ai+1). Each node of a Binary Tree therefore contains only one branch key and two branches, so that there
are only two ("binary") branches from any node. The leftmost branch Ai is taken if search key Kx is less
than node key Ki and the rightmost branch Ai+1 is taken if search key Kx is greater than Ki.

The B'-tree and the B*-tree are similar to the B-tree except that in the B'-tree all information or pointers
to information may be located only in the leaf nodes, that is, the lowest nodes of the tree, while in the
B*-tree all failure nodes, that is, all leaf nodes, are at the same level in the tree. The B*-tree also has
specific requirements on the maximum and minimum number of branches depending from the root and branch
nodes.

The Bit Tree is again similar to the B-tree in its root and branch nodes, but differs in its leaf nodes in
that the Bit Tree does not store keys in the leaf nodes. Instead, each pointer in a leaf node has associated
with it a "distinction bit" which indicates the first bit in which the key for that branch differs from the branch
key contained in the root, or next higher, node to that leaf node. Distinction bits are generated by
comparing the binary expression for the branch key for a pointer in a leaf node with the binary expression
for the node key of its root node and noting the binary number of the lowest order bit in which the two keys
differ. That number, which is actually the number of the distinction or difference bit, is then stored in the
leaf node in association with the pointer. A search is conducted, at the leaf node level, by comparing the
search key with the node key of the leaf's parent node and determining the lowest order bit in which the
search key differs from the node key; the search then takes the leaf's pointer which is associated with the
next lower order distinction bit.

The Trie is an index tree using variable length key values and wherein the branching at any level of the
Trie is determined by only a part of the key, rather than by the whole key. Also, in a Trie the branching at
any level is determined by the corresponding sequential character of the key, that is, the branching at the
jth level of the Trie is determined by the jth character of the key. Searching a Trie for a key value Kx requires
breaking Kx into its component characters and following the branching values determined by those
component characters. If, for example, Kx = LINK, then the branching at the first level is determined by
the branch corresponding to component L, at the second level by component I, at the third level by N, and
at the fourth level by K. This requires that, at the first level, all possible characters of the search keys be
partitioned into individual, disjoint classes, that there be a first level branch for each class, and that the Trie
contain a number of levels corresponding to the number of characters in the longest expected search key.
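
The character-by-character branching of a Trie can be sketched as follows in Python. This is an
illustration under stated assumptions rather than a structure taken from the prior art references: the
nested-dictionary representation, the function names trie_insert and trie_search, and the use of the empty
string as an end-of-key marker are all choices made for this sketch.

```python
def trie_insert(root, key, record):
    """Insert key into a trie of nested dicts, one level per key character."""
    node = root
    for ch in key:
        # The j-th character of the key selects the branch at the j-th level.
        node = node.setdefault(ch, {})
    # The empty string marks "a key ends here" (an assumption of this sketch).
    node[""] = record


def trie_search(root, key):
    """Return the record stored for key, or None if the key is absent."""
    node = root
    for ch in key:
        if ch not in node:
            return None  # no branch for this character: the search fails
        node = node[ch]
    return node.get("")


# Hypothetical keys; "LINK" is the example key used in the text.
trie = {}
trie_insert(trie, "LINK", "record 1")
trie_insert(trie, "LINE", "record 2")
found = trie_search(trie, "LINK")
missing = trie_search(trie, "LIN")
```

Each key character costs one level of the tree, so a key of four characters such as "LINK" is resolved
only at the fourth level even when no other key shares its leading characters.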

Finally, in a Prefix B-tree each node is again of the form (A0,K0)...(Ai,Ki)...(An,Kn) and is searched in the
same manner as a B-tree, but each key Ki in a Prefix B-tree is not a full key but is a "separator", or prefix
to a full key. The keys of each node in any subtree of a Prefix B-tree all have a common prefix, which is
stored in the root node of the subtree, and each key Ki of a node is the common prefix of all nodes in the
subtree depending from the corresponding branch of the node. Again, there is a binary variant of the Prefix
B-tree, referred to as a Prefix Binary Tree, in which each node contains only one branch key and two
branches, so that there are only two ("binary") branches from any node. The Prefix Binary Tree is searched
in the same manner as a Binary Tree, that is, branching left or right depending on whether the search key
is less than or greater than the node key. There are also, in turn, Bit Tree variants of the Prefix Binary Tree
wherein distinction bits rather than prefixes are stored in the nodes. In particular, the values stored are the
numbers of the bits in the keys which are different between two prefixes, thus indicating the key bits to be
tested to determine whether to take the right or left branches.

The above described search trees of the prior art are generally intended to provide certain optimum
characteristics for the most general cases of information searches and the most general types or classes of
information. Certain trees may be designed, for example, to provide the minimum depth of tree so as to
reduce the number of disk accesses required to bring successive nodes or groups of nodes into system
memory, or to provide the minimum search time, or to equalize the search times for all searches, or to
allow the easy insertion or deletion of nodes. The tree structures of the prior art do not, however, provide
optimum structures for certain broad classes of information. For example, the prior art tree structures are
generally not optimum in cases wherein the keys may be divided into rather large partitions, as is the case
with certain types of information, and do not provide the optimum structures for creating and modifying
search trees for such types of keys and information.

Yet another disadvantage of the tree structures of the prior art is that it is generally necessary to search
completely to the data record level to determine whether or not a particular data item is present in the
database. This is often described as a requirement that all failure nodes be at the same level in the tree.
This disadvantage arises from the inherent search methodology as determined by the structure of the trees.
As described, the search key is compared to the node keys to determine the branch paths having the range
of key values most likely to contain a match with the search key. Because the search is based upon
identifying the branches having ranges of key values, there is no point in the search short of the actual data
records at which a determination can be made as to whether a search key can actually be matched to a data
record.

A solution to the above described problems of the prior art, and other problems, is provided by a
prefix index tree of the present invention, which is particularly adapted to those classes of information
wherein the keys may be divided into rather large partitions. The tree structure of the present invention
further provides an improved structure for creating and modifying search trees for such types of keys and
information. The tree structure of the present invention further does not require that all searches continue to
the data record level before it can be determined that a particular data item is not present in the database.

Summary of the Invention

The tree structure of the present invention provides a prefix index tree structure for locating data
records stored in a database in a data processing system through keys related to the information stored in
the data records. Each node of the tree includes a prefix field for storing a prefix string of length p
comprised of the longest string of key characters shared by all subtrees of the node and a data record field
for storing a reference to a data record whose key is completed by the prefix string. The tree structure
further includes one or more branch fields when the prefix string is a prefix of keys stored in at least one
subtree of the node. There is a branch field for each distinct (p+1)th key character in the keys of the
subtrees, wherein each distinct (p+1)th key character is a branch character. Each branch field includes a
branch character field for storing the (p+1)th character of a key and a branch pointer field for storing a
reference to a node of a subtree containing at least one key whose (p+1)th character is the branch character.
In further embodiments of the present invention, each node further includes a field for storing a number
equal to the number of key characters in the prefix string, and a field for storing a number equal to the
number of branch fields in the node.

The present invention further includes methods for constructing and searching a prefix index tree of the
present invention, and for inserting nodes into the tree and deleting nodes from the tree.

Brief Description of the Drawings 



The foregoing and other objects, features and advantages of the present invention will be apparent from
the following description of the invention and embodiments thereof, as illustrated in the accompanying
figures, wherein:

Fig. 1 is a diagrammatic representation of a data processing system and an index tree resident therein;
Fig. 2 is a diagrammatic representation of a node of a tree of the present invention;
Fig. 3 is a diagrammatic illustration of a tree of the present invention;
Figs. 4A, 4B and 4C are illustrations of the insertion of nodes into a tree; and,
Fig. 5 is an illustration of the deletion of nodes from a tree.



Description Of The Preferred Embodiments 




A. General Description of a Tree in a Data Processing System (Fig. 1) 

Referring to Fig. 1, therein is an illustrative representation of a Data Processing System 10 and an index
Tree 12, with Tree 12 arranged to illustrate the residence of Tree 12 in the addressable memory space of
System 10. System 10 is comprised of a Central Processing Unit (CPU) 14, which is in turn comprised of
an Arithmetic and Logic Unit (ALU) 16 with associated Working Registers 18, a directly addressable
Memory 20, which may also include a cache memory, and associated storage in the form of a Disk 22.

Tree 12 is represented as having a single Root Node 24 and a plurality of Branch Nodes (Node) 26 and
Leaf Nodes (Leaf) 28, all connected through Pointers, or branches, 30. As indicated, the Branch Nodes 26
are further designated according to their levels in Tree 12, that is, according to their depth in Tree 12 and,
correspondingly, the number of nodes that must be traversed to reach a given node. In this illustration of a
tree, there are two Level 1 Branch Nodes, each designated as an L1 Node 26, several Level 2 and Level 3
Branch Nodes, each respectively designated as an L2 Node 26 or an L3 Node 26, and a single Level 4
Branch Node, designated as L4 Node 26.

Tree 12 is positioned relative to System 10 in Fig. 1 to illustrate the locations of the various elements of
Tree 12 in System 10's address space, and arrows extend rightwards from System 10 to indicate the
boundaries of the various regions of System 10's address space. For example, at the start of a search, as
illustrated in Fig. 1, Root Node 24 would most probably be located in Working Registers 18 of System 10's
CPU 14, and thus would be directly accessible to ALU 16, as would one or more of the L1 Nodes 26 and
possibly one or more of L2 Nodes 26. Further of Nodes 26 and perhaps certain of Leafs 28 would be found
in Memory 20, while the deeper nodes of Tree 12 would be found stored as files in Disk 22.

The locations of the various Tree 12 nodes in System 10's address space affect the specific forms
taken by the nodes and by the Pointers 30 stored therein. For example, and as will be described in the
following detailed description of a Tree 12 according to the present invention, each node is always of the
same basic form, that is, a set of fields containing specific types of information in a specific format. Nodes
residing in Working Registers 18, however, are located in specific registers while nodes located in Memory
20 reside in physical memory locations which may be dynamically reassigned and which are located
through logical addresses. Nodes residing in Disk 22 will reside in disk files. Correspondingly, the Pointers
30 to nodes residing in Working Registers 18 may take the form of logical address pointers or, more likely,
specific ALU 16 register identifications. Pointers 30 to nodes located in Memory 20 will take the form of
logical address pointers which are translated, by System 10, to Memory 20 physical addresses when their
corresponding nodes are to be accessed. Pointers 30 to nodes residing in Disk 22 will be in the form of file
references. It should be noted, however, that while the specific forms of the information contained in the
fields of a node may change with the location of the node in System 10's address space, the functional,
structural and logical relationships of the various elements of the nodes of Tree 12 remain the same.

The locations of the nodes in System 10's address space also affect the speed with which System 10
may access the nodes and process the information contained therein, and correspondingly the speed with
which System 10 may perform a search. For example, the nodes residing in Working Registers 18 are directly
accessible to ALU 16 and may be processed in correspondingly little time. The nodes residing in Memory
20 and in any associated cache memory are also relatively quickly accessible to CPU 14, requiring only the
delay of a logical to physical address translation and a memory access cycle to be read into Working
Registers 18 as the search progresses. The access time to the nodes of Tree 12 becomes greater, however,
the deeper into Tree 12 the search progresses. In particular, the nodes residing in Disk 22 require a disk
access operation and a file read to be transferred into Memory 20, and a subsequent transfer into Working
Registers 18. It is therefore advantageous that Tree 12 be as "flat" as possible, that is, contain as high a
degree of branching as possible, to move the nodes up towards the root node to decrease the node access
time and, in particular, reduce the number of disk accesses required to search Tree 12. It is also
advantageous to move the Leaf Nodes 28 up into Tree 12's structure as far as possible, rather than
requiring all Leaf Nodes 28 to reside at the same, and lowest, level of Tree 12. As will be described next
below, the Tree 12 of the present invention provides an approach to providing these advantages for certain
broad classes of information.



B. Description of a Tree of the Present Invention (Figs. 2 and 3)

A Tree 12 of the present invention is designed for use wherein the keys may be placed into suitably
large partitions determined by leading characters shared with other keys. Tree 12 is a dense index structure
using variable length, character oriented keys. Branching at any level is determined by a part of the key,
rather than by the whole key, and the structure of the Tree 12 is independent of the order in which the Tree
12 is constructed.

A Tree 12 of the present invention is a prefix search tree that is either empty or is of height greater than
or equal to one, that is, contains one or more levels, and satisfies the following properties:

(i) Any node, T, of the tree is of the form and type

    p, s, (P1..Pp), D, ((B1,S1)..(Bs,Ss))

where the Pi, 0<i<=p, represent the prefix string, the tuples (Bj,Sj), 0<j<=s, are branch characters and
subtrees of T, respectively, and D is a pointer to a data record;

(ii) The prefix (P1..Pp) contains the longest string of leading characters shared by every key contained in
T (and the subtrees dependent from T);

(iii) D is a pointer to the record with the key of length p, or is a null if there is no such key;

(iv) Each Bi, 0<i<=s, is a distinct character which is the (p+1)th character of some key in T, that is, of a
subtree dependent from T, whose length is greater than p;

(v) Bi < Bi+1, 0<i<s;

(vi) Each Si is a pointer to a prefix search tree dependent from T; and,

(vii) The keys in a subtree referenced by an Si, 0<i<=s, are formed from the set of keys in T having Bi as
their (p+1)th character, by removing their initial p+1 characters.

Referring to Fig. 2, therein is represented a diagrammatic illustration of the structure and format of a
single node (T) 32 of a Tree 12 of the present invention according to the definition presented above. As
shown, T 32 may contain a Prefix Field (PF) 34 which contains a prefix of length p (P1..Pp) comprised of
the longest string of characters shared by all keys of every subtree dependent from node T 32, and a Data
Pointer Field (D) 36 which contains a Pointer 30 to a data record having the key (P1..Pp), if there is such a
key and data record. T 32 may also contain one or more Branch Fields (BFs) 38, each of which is
comprised of a Branch Character Field (BC) 40 for storing a branch character Bj and a Branch Pointer Field
(BP) 42 for storing a corresponding branch pointer Sj. As described, each Bj is the (p+1)th character of a key
of length greater than p of a subtree dependent from T 32 while each associated Sj is a pointer to the node
T 32 of that subtree. Finally, each node T 32 will include a p Field 44 and an s Field 46 containing,
respectively, the length, or number of characters, in the prefix stored in PF 34 and the number of subtrees
(or data records) dependent from the node T 32, that is, the number of BF 38s contained in the node T 32.
Although p Fields 44 and s Fields 46 are not a necessary part of the structure of nodes T 32, these fields
are provided to assist System 10 in processing the nodes. That is, it is more efficient to inform the
processor as to the length of the prefixes contained in the PF 34s and the number of Branch Fields 38 than
to have the system extract this information from the PF 34s and BF 38s.
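
The node format described above can be sketched, for illustration only, as a small Python class. The class
name Node and the attribute names prefix, data and branches are assumptions of this sketch and correspond to
PF 34, D 36 and the BF 38s respectively. Unlike the stored p Field 44 and s Field 46 of the described node,
the sketch derives p and s on demand, which is a simplification that an in-memory illustration can afford.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    """One node T: p, s, (P1..Pp), D, ((B1,S1)..(Bs,Ss))."""

    prefix: str = ""                 # PF 34: the prefix string (P1..Pp)
    data: Optional[Any] = None       # D 36: data record reference, or None for null
    # BF 38s: (branch character Bj, subtree Sj) pairs kept in ascending Bj order
    branches: List[Tuple[str, "Node"]] = field(default_factory=list)

    @property
    def p(self) -> int:
        """p Field 44: number of characters in the prefix (derived here)."""
        return len(self.prefix)

    @property
    def s(self) -> int:
        """s Field 46: number of branch fields in the node (derived here)."""
        return len(self.branches)


# A hypothetical leaf holding a three-character prefix and one data record.
leaf = Node(prefix="ree", data="data record")
# A hypothetical branch node with an empty prefix and a single branch.
parent = Node(branches=[("t", leaf)])
```

A node with s equal to zero is a leaf in the sense used below: it has the same format as any other node but
carries no branch fields.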

As will be described below with reference to Fig. 3, certain nodes of a Tree 12 of the present invention
may be "leaf" nodes, which are identical in structure to the branch nodes T 32 except that they contain no
Branch Fields 38, as the branches are nulls.

Referring to Fig. 3, therein is a diagrammatic illustration of a Tree 12 of the present invention using the
key values "Btree", "Binary", "BinarySearch", "BinaryTree", "HashTable", "HashFunction", and
"HashedFile".

It is apparent from an examination of the keys used for this example that the Tree 12 of Fig. 3 will have
two branches, or subtrees, dependent from the root node. One branch will contain nodes for the keys
having the initial character "B" (Btree, Binary, BinarySearch, and BinaryTree) and the other the nodes for
the keys having the initial character "H" (HashTable, HashFunction and HashedFile). Accordingly, PF 34 of
root node T 32A will be null as there is no common prefix shared between the keys starting with "B" and
the keys starting with "H", and T 32A's D field 36 will also be a null as there are no data records dependent
from T 32A. T 32A will contain a first BF 38 field for the T 32A subtree containing all keys having an initial
character "B" and a second BF 38 field for those keys having initial character "H". Considering the first BF
38 field, the BC 40 field Bj character in this field will be the character "B", as "B" is the (p+1)th character of
the keys of the corresponding subtree of T 32A, and the BP 42 field will contain an Sj pointer Sb to the first
node in this subtree, T 32B. The second BF 38 field of T 32A will contain the character "H" as its Bj in the
BC 40 field, as this is the (p+1)th character of the keys of the corresponding subtree, and the Sj pointer in the
BP 42 field will be a pointer Sh to the first node in this subtree, T 32C. The p field 44 and s field 46 of T 32A
will respectively contain a 0, to indicate that the PF 34 field of T 32A contains no prefix characters, that is, is
a null, and a 2, to indicate that T 32A has two "children", that is, that there are two branches from T 32A.
Considering T 32B, the next branch in the keys having initial character "B" will occur between the key
"Btree", having "t" as its second character, and the keys having "i" as their second character (Binary,
BinarySearch and BinaryTree). There are no common prefix characters shared between the keys branching
from this node, so that T 32B's PF 34 field will contain a null, as will T 32B's D field 36. T 32B will
again have two BF 38s, with the first having a Bj of "i" and the second having a Bj of "t", "i" and "t" being
the (p+1)th characters of the keys of the subtrees dependent from these branches. The corresponding Sj
pointers will be pointers Si and St to, respectively, nodes T 32D and T 32E. The p Field 44 and s Field 46
of T 32B will respectively contain a 0, indicating that the PF 34 field contains no prefix characters, and a 2,
indicating that T 32B has two children, or branches.

Next considering T 32E, this node contains a reference to a data record, but no further branches to
further nodes. As such, the BF 38 fields of T 32E contain nulls, that is, the node contains no BF 38 fields.
The PF 34 field of T 32E contains the final portion of the key for the associated data record, the character
string "ree" in the case of T 32E, and a D field 36 containing a pointer to the data record. The p Field 44
and s Field 46 respectively contain a 3, indicating that the PF 34 field contains three characters, and a 0,
indicating that Leaf 48A has no branches to subtrees.

Next considering T 32D, the other node dependent from node T 32B, the subtree of which T 32D is the
root node contains the keys "Binary", "BinarySearch" and "BinaryTree", wherein the leading characters "B"
and "i" of these keys are stored as branch characters in the BC 40 fields of, respectively, T 32A and T 32B.
The longest prefix common to the remaining portions of these keys, that is, to "nary", "narySearch" and
"naryTree", is the character string "nary". As such, the character string "nary" is stored as a prefix in the
PF 34 field of T 32D.

Of the three keys in this subtree, all three keys differ in the next character following "nary" and T 32D
could thus have three branches. "nary" is, however, the final portion of the key "Binary", so that, rather
than resulting in a branch to another node, the key "Binary" results in a pointer to the data record
associated with the key "Binary" being written into the D field 36 of T 32D.

The keys "BinarySearch" and "BinaryTree", however, have remaining character strings following "nary"
and thus result in branches from T 32D. The (p+1)th character of "BinarySearch" is "S", so that "S" appears
as the Bj of a first BF 38, together with an Sj pointer Ss to the associated node T 32F in the BP Field 42.
The (p+1)th character of "BinaryTree" is "T", so that "T" appears as the Bj of the second BF 38, together
with an Sj pointer St to the associated node T 32G in the BP Field 42. The p Field 44 and s Field 46 of T
32D respectively contain a 4, to indicate that the PF 34 field contains a string of 4 characters, and a 2, to
indicate that there are two branches from T 32D.

T 32F and T 32G are both similar to T 32E in that these nodes contain no further branches to other
nodes, and thus have null, or empty, BF 38 fields, but pointers to associated data records in their respective
D 36 fields. The PF 34 field of T 32F contains the character string "earch", which is the final portion of the
key "BinarySearch", while the PF 34 field of T 32G contains the character string "ree", which is the final
portion of the key "BinaryTree". The p Field 44 of T 32F contains a 5, for the five characters in "earch", and
the p Field 44 of T 32G contains a 3, for the three characters in "ree", while the s Field 46 of each node
contains a zero, indicating that there are no branches from either node.

Referring briefly to the right hand subtree of Tree 12, comprised of nodes T 32C, T 32H, T 32I and T
32J, this subtree is constructed by the same principle as just described above. The keys contained in this
subtree are "HashTable", "HashFunction" and "HashedFile" and the character "H" of all three keys
appears as the Bj of the corresponding BF 38 of T 32A as the (p+1)th character following the prefix appearing
in PF 34 of T 32A. As previously described, PF 34 of T 32A contains a null character string as there is no
common prefix character string between the two branches dependent from T 32A.

The longest prefix string common to the remaining portions of these keys, that is, to "ashTable",
"ashFunction" and "ashedFile", is the string "ash" and "ash" accordingly appears in the PF 34 field of T
32C. Because there are three keys having the common prefix string "ash", there will be three branches
from T 32C. The (p+1)th characters of the remaining portions of these three keys are, after removing "ash",
respectively, "T", "F" and "e". "T", "F" and "e" accordingly appear as the Bjs in the BF 38s of T 32C,
together with corresponding Sj pointers Sf, St and Se to nodes T 32H, T 32I and T 32J. The p Field 44
and s Field 46 of T 32C respectively contain a 3, to indicate a character string of three characters in PF 34,
and a 3, to indicate that there are three branches from T 32C.

Nodes T 32H, T 32I and T 32J are again "leaf" nodes in that they contain pointers to data records in
their D fields 36, but no further branches and correspondingly no BF 38s. The PF 34 field of T 32H contains
the string "unction", which is the remaining portion of key "HashFunction", while the PF 34 fields of T 32I
and T 32J respectively contain "able" and "dFile", the final portions of keys "HashTable" and
"HashedFile". The s Fields 46 of each of these nodes contain 0s, as there are no branches from these
nodes. The p Fields 44 of these nodes respectively contain a 7, a 4 and a 5, representing the number of
characters in the remaining portions of the keys stored in their PF 34 fields.
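
The structure of the example tree follows directly from properties (i) through (vii), and can be reproduced
with a short Python sketch. This is not the construction or insertion method of the invention; it is a
hypothetical recursive builder, named build here, that takes the whole key set at once, and the Node class
and the record strings are likewise assumptions of the illustration.

```python
from dataclasses import dataclass, field
from os.path import commonprefix
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Node:
    prefix: str = ""                                   # PF 34
    data: Optional[Any] = None                         # D 36
    branches: List[Tuple[str, "Node"]] = field(default_factory=list)  # BF 38s


def build(keys: Dict[str, Any]) -> Optional[Node]:
    """Build a prefix search tree from a mapping of key -> data record."""
    if not keys:
        return None  # the empty tree
    # Property (ii): longest string of leading characters shared by every key.
    prefix = commonprefix(list(keys))
    p = len(prefix)
    # Property (iii): D references the record whose key has length p, if any.
    node = Node(prefix=prefix, data=keys.get(prefix))
    # Properties (iv) and (vii): group longer keys by their (p+1)th character
    # and strip the initial p+1 characters before descending.
    groups: Dict[str, Dict[str, Any]] = {}
    for key, record in keys.items():
        if len(key) > p:
            groups.setdefault(key[p], {})[key[p + 1:]] = record
    # Property (v): branch characters are kept in ascending order.
    for ch in sorted(groups):
        node.branches.append((ch, build(groups[ch])))
    return node


names = ["Btree", "Binary", "BinarySearch", "BinaryTree",
         "HashTable", "HashFunction", "HashedFile"]
records = {name: "rec:" + name for name in names}
root = build(records)
# Building from the keys in reverse order yields an identical tree.
same = build({name: records[name] for name in reversed(names)})
```

In this sketch the root has an empty prefix and branch characters "B" and "H", the node reached through "B"
and then "i" holds the prefix "nary" together with the record for "Binary", and the node reached through
"H" holds the prefix "ash" with three branches, matching the walkthrough above.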

C. Searching of a Tree 12

In order to search for any given key value in the Tree 12 of the present invention, System 10 begins at
the root node and proceeds through the Tree 12, node by node, as described in the following, until the
search reaches a failure node, that is, a node which has no match for the search key, or succeeds by
finding the data record corresponding to the search key.

Starting in the root node, the system compares the search key (K), which has a length, or number of
characters, k, to the prefix character string (P), which has a length p, stored in the PF 34 of the node to
determine whether the prefix matches at least the initial characters of the search key. That is, to determine
whether k >= p and Ki = Pi for all i, 0<i<=p. In this regard, it should be noted that if p = 0, that is, if
P is a null string, then zero characters of the search key and prefix are considered matched.

If there is a complete match between search key K and prefbc P. that is, P =K. then the corresponding 
as data record is pointed to by the pointer stored in the D field 36 of the nods. 

If there is a match between the prefix character string, which has a length p, and the first p characters
of the search key character string, then the system searches the Bj's of the BC 40 fields of the BF 38s to
find a Bj which matches the p+1th character of the key K (Kp+1). If the search finds no Bj = Kp+1, then the
key value is not contained in the node and the search has failed.
If the search finds a Bj = Kp+1, then the search follows the associated Sj pointer to the corresponding
next node and continues the search. It will be remembered, however, that the prefix for each succeeding
node in the tree is comprised of the longest prefix string common to the remaining portions of the keys
after removal of the leading prefix characters which have been incorporated into the prefixes of previous
nodes. In a like manner, the key used to search a next node of the tree has a new key value of Kp+2..Kk,
that is, is comprised of the portion of the search key remaining after removal of the leading key characters
which have been matched to prefixes and branch characters in previous nodes.
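
The search procedure described above can be sketched in Python. This is a minimal in-memory illustration, not the disclosed implementation: the class name Node, its field names, the function name search and the record labels are assumptions introduced here, and a dictionary stands in for the ordered list of branch fields.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prefix: str = ""                    # PF 34: prefix string
    data: Optional[str] = None          # D Field 36: data record reference
    branches: Dict[str, "Node"] = field(default_factory=dict)  # BF 38s: Bj -> Sj


def search(root: Optional[Node], key: str) -> Optional[str]:
    """Return the record reference stored under key, or None if the search fails."""
    node, n = root, 0                   # n counts key characters consumed so far
    while node is not None:
        p = len(node.prefix)
        if key[n:n + p] != node.prefix:     # prefix too long or mismatched
            return None
        n += p
        if n == len(key):                   # key exhausted: this node decides
            return node.data
        node = node.branches.get(key[n])    # follow Sj where Bj equals the next key character
        n += 1                              # the branch character is consumed as well
    return None


# A tree shaped like the "Hash" example: one prefix node with three branches.
root = Node("Hash", None, {
    "F": Node("unction", "rec-HashFunction"),
    "T": Node("able", "rec-HashTable"),
    "e": Node("dFile", "rec-HashedFile"),
})
```

With this tree, search(root, "HashTable") follows the "T" branch and compares the remaining "able" against the child's prefix, while search(root, "Hash") fails because the "Hash" node carries no data record reference.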

Further description of the searching of a tree of the present invention may be found in the following
exemplary Search Program Listing A:
PROGRAM LISTING A - TREE SEARCH

procedure PSEARCH(T, (K1..Kk))
    // search the prefix search tree T residing on disk for the key value (K1..Kk). A tuple
       (i,d) is returned; i is false if K does not exist, otherwise i is true and d is the data
       record pointer //
    if (T=0) then return (FALSE, 0)            // special case: tree is empty //
    X=T; n=0
    loop
        input node X from disk
        let X define p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
        // if the prefix is too long, can't possibly match the key //
        if n+p>k then return (FALSE, 0)
        // match the prefix to the leading characters in the key //
        for i=1 to p do
            n=n+1
            if Kn<>Pi then return (FALSE, 0)
        end
        // determine if this node contains the key //
        if n=k then (if D=null then return (FALSE, 0)
                     else return (TRUE, D))
        // determine which node to process next. search branch characters //
        n=n+1
        j=1
        loop
            case
                :j>s: return (FALSE, 0)
                :Kn<Bj: return (FALSE, 0)
                :Kn=Bj: exit
                :else: j=j+1
            end
        forever
        X=Sj
    forever
end PSEARCH



D. Construction of a Tree and Insertion of Nodes (Figs. 4A, B and C)

The construction of a Tree 12 is performed in the same manner and by the same method as is used to insert
new nodes into an existing tree, except that the initial node of a new tree is inserted into an otherwise
empty tree. For this reason, the following discussion will describe the insertion of nodes into an existing
tree, with the understanding that the description applies equally to the construction of new trees.

There are five general conditions requiring the insertion of a new node into a Tree 12:

(a) A mismatch occurs between a prefix and a new key before the end of either character string, a
condition referred to as a "prefix collision";

(b) A new key is longer than the prefix in question and the key matches for the entire length of the prefix
but there are either no branch characters or the next character in the key after the last character of the
prefix is not among the branch characters, a condition referred to as a "branch collision";

(c) A new key is shorter than the prefix in question, and the prefix and the key match for the entire length
of the key, a condition referred to as an "initial substring";

(d) The length of a new key is equal to that of the prefix in question, and the key and the prefix match,
but there is no data associated with the prefix, a condition referred to as a "data collision"; and,

(e) The tree is empty.

Considering first the instance of a prefix collision, a prefix collision requires the creation of three nodes
to replace the node where the collision occurred: one to replace the previously existing node and two nodes
dependent from that node. Of the two new dependent nodes, one will contain the portion of the key
occurring beyond the character which caused the match to fail and the other will contain the portion of the
prefix occurring beyond the character which caused the match to fail. The third node, which is the
replacement for the original node, will contain the portion of the original prefix which matched with the key
and will include two branches and, correspondingly, two BF 38s. One BF 38's Bj will be the character of the
prefix which caused the match to fail and the associated Sj will point to the new subnode containing the
remaining portion of the original prefix. The other BF 38's Bj will be the character of the key which caused
the match to fail, and the associated Sj will point to the new subnode containing the remaining portion of the
key.

This operation is illustrated in Fig. 4A, wherein the new key "HashTable" is to be added to a tree at a
node T 48A containing the prefix "HashFunction". The initial character strings "Hash" of the original prefix
and the new key match, but the match fails at the "F" of the original prefix and the "T" of the new key. A
first new subnode T 48B is created whose PF 34 contains the portion of the original prefix occurring after
the prefix failure character, that is, the string "unction" which follows the prefix failure character "F".
Original node T 48A had a D Field 36 pointer to a data record, so that new first subnode T 48B also has a
D Field 36 pointer to that same data record. If node T 48A had contained a Field BF 38, this would then
appear in the new subnode T 48B.

The second new subnode T 48C contains in its PF 34 the portion of the key occurring after the key
failure character, that is, the string "able" which follows the key failure character "T". Second new subnode
T 48C will also contain a D Field 36 pointer to the data record associated with the key "HashTable".

Finally, new node T 48D which replaces original node T 48A has the string "Hash" in its PF 34, that is,
the portion of the prefix and key strings which matched. A first BF 38 of new node T 48D contains a Bj of
"F", that is, the prefix character which failed in the match, and an associated Sj pointer to the new subnode
having the prefix "unction", the remaining portion of the original prefix. A second BF 38 of new node T 48D
contains a Bj of "T", that is, the key character which failed in the match, and an associated Sj pointer to the
new subnode having the prefix "able", the remaining portion of the key. Although original node T 48A had a
D Field 36 pointer to a data record, this pointer now appears in first subnode T 48B, so that the
replacement for original node T 48A has no D Field 36 pointer.
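
The three-node replacement performed on a prefix collision can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the names Node and split_prefix_collision are hypothetical, nodes are held in memory rather than on disk, and the caller is assumed to have already established that a mismatch occurs before the end of both strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prefix: str = ""                    # PF 34
    data: Optional[str] = None          # D Field 36
    branches: Dict[str, "Node"] = field(default_factory=dict)  # BF 38s: Bj -> Sj


def split_prefix_collision(old: Node, key_rest: str, record: str) -> Node:
    """Return the replacement for old when key_rest and old.prefix diverge.

    Assumes (hypothetically checked by the caller) that the two strings differ
    at some position before the end of either one.
    """
    i = 0
    while old.prefix[i] == key_rest[i]:     # find the first conflicting position
        i += 1
    key_node = Node(key_rest[i + 1:], record)                        # remainder of the key
    prefix_node = Node(old.prefix[i + 1:], old.data, old.branches)   # remainder of the prefix
    # Replacement node: common part, no data, one branch per conflicting character.
    return Node(old.prefix[:i], None,
                {old.prefix[i]: prefix_node, key_rest[i]: key_node})


original = Node("HashFunction", "rec-HashFunction")
replacement = split_prefix_collision(original, "HashTable", "rec-HashTable")
```

Here replacement plays the role of T 48D with prefix "Hash", its "F" branch plays the role of T 48B ("unction", carrying the original data reference) and its "T" branch plays the role of T 48C ("able").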

Next considering the case of a branch collision, a branch collision requires the creation of two nodes to
replace the original node where the collision occurred. One node will be a subnode which will contain in its
PF 34 the portion of the key occurring beyond the character which was not found among the branch
characters Bj of the original node. The other new node will contain the prefix, branch characters and
subtrees of the original node in which the branch collision occurred, with the addition of a branch character
Bj, the new branch character being the key character which was not found as a branch character in the
original node. Associated with this new branch character will be an Sj pointer to the new subnode.

This operation is illustrated in Fig. 4B, wherein the new key "HashedFile" is to be added to the tree
resulting from the operation illustrated in Fig. 4A. The new key "HashedFile" is longer than prefix "Hash" of
node T 48D and matches the entire prefix. The next character of the key, "e", however, is not found in the
BF 38s of T 48D. Accordingly, a new node T 48E is created containing, as a prefix in its PF 34, the key
character string "dFile", which is the portion of the key after unfound branch character "e". A corresponding
new BF 38 is created for T 48D with branch character "e" and an associated Sj pointer to new node T 48E.
It should be noted that new node T 48E contains a D Field 36 pointer to the data record associated with key
"HashedFile" and that nodes T 48B and T 48C remain unchanged.

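The branch collision case can be sketched in the same hypothetical Python model. The names Node and add_branch are assumptions for illustration, and the sketch adds the new branch to the existing node in place rather than writing a replacement node to disk; a dictionary is used for the branch fields, so the ordering of branch characters is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prefix: str = ""                    # PF 34
    data: Optional[str] = None          # D Field 36
    branches: Dict[str, "Node"] = field(default_factory=dict)  # BF 38s: Bj -> Sj


def add_branch(node: Node, key_rest: str, record: str) -> Node:
    """Handle a branch collision: key_rest extends node.prefix with an unknown branch character."""
    p = len(node.prefix)
    if key_rest[:p] != node.prefix or len(key_rest) <= p:
        raise ValueError("key must match the whole prefix and be longer than it")
    branch_char = key_rest[p]
    if branch_char in node.branches:
        raise ValueError("branch character already present: not a branch collision")
    # New subnode holds the part of the key after the branch character.
    node.branches[branch_char] = Node(key_rest[p + 1:], record)
    return node


tree = Node("Hash", None, {
    "F": Node("unction", "rec-HashFunction"),
    "T": Node("able", "rec-HashTable"),
})
add_branch(tree, "HashedFile", "rec-HashedFile")
```

After the call, tree has a third branch "e" leading to a node with prefix "dFile", and the "F" and "T" subnodes are untouched.
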
Considering the instance of an initial substring, when an initial substring is encountered two nodes are
created to replace the node where the collision was detected. The first node will contain, in its PF 34, the
portion of the prefix which was not matched by the key, minus its initial character, together with the
subtrees and branch characters of the original node. The other node will contain, in its PF 34, the portion of
the prefix which was matched by the key, with the initial character of the unmatched portion of the prefix as
its sole branch character and an associated Sj pointer to the first node, which will be a subnode of this
second node.

This operation is illustrated in Fig. 4C, wherein the key "Binary" is to be added to node T 48F which
has prefix "BinarySearch" and a D Field 36 pointer to a data record. The "Binary" character strings of
both the key and the prefix match, while the "Search" portion of the prefix is not matched by the key.
Accordingly, a new node T 48G is created having the string "earch" as its prefix, that is, the portion of the
original prefix which was not matched by the key, minus its initial character, "S". T 48G also has a D Field
36 pointer to the data record originally associated with the original node. If T 48F had had branch characters
and branch pointers to other nodes of the tree, these branch characters and pointers would be replicated in
the new node T 48G. The second new node T 48H is created with a prefix of "Binary", that is, the portion
of the original prefix which was matched by the key, and a single branch character "S", which is the initial
character of the portion of the original prefix which was not matched by the key. Associated with branch
character "S" will be a pointer to the new node T 48G, and T 48H will contain a D Field 36 pointer to any
data record associated with the key "Binary".
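
The initial substring case can be sketched as below. The names Node and split_initial_substring are hypothetical, and the sketch assumes the caller has already determined that the new key is a proper leading substring of the node's prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prefix: str = ""                    # PF 34
    data: Optional[str] = None          # D Field 36
    branches: Dict[str, "Node"] = field(default_factory=dict)  # BF 38s: Bj -> Sj


def split_initial_substring(old: Node, key_rest: str, record: str) -> Node:
    """Return the replacement for old when key_rest is a proper leading part of old.prefix."""
    k = len(key_rest)
    if k >= len(old.prefix) or not old.prefix.startswith(key_rest):
        raise ValueError("key must be a proper initial substring of the prefix")
    # First node: unmatched part of the prefix minus its initial character,
    # together with the original data reference and branches.
    tail = Node(old.prefix[k + 1:], old.data, old.branches)
    # Second node: matched part, the new record, and a single branch to the first node.
    return Node(old.prefix[:k], record, {old.prefix[k]: tail})


original = Node("BinarySearch", "rec-BinarySearch")
replacement = split_initial_substring(original, "Binary", "rec-Binary")
```

In terms of Fig. 4C, replacement corresponds to T 48H ("Binary") and its single "S" branch corresponds to T 48G ("earch").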

Finally, there are the cases of a data collision and an empty tree. As described, in a data collision the
length of a new key is equal to the length of the prefix and the key and prefix match but there is no data
associated with the prefix. Data collisions are handled simply by adding the data to the node and rewriting
the node with a D Field 36 pointer to the data record.

The instance of an empty tree is similarly straightforward. The system creates an initial node by
selecting a suitable root node prefix for the tree, for example, by selecting a set of keys providing the
longest common prefix, and proceeds to add further nodes according to the methods described above.
Further description of the above node insertion methods will be found in the following exemplary Insert
Program Listing B:
PROGRAM LISTING B - NODE INSERT

procedure PINSERT(T, (K1..Kk), d)
    // insert the key value (K1..Kk) into the prefix search tree T, with data record pointer
       d. False is returned if d is null or if the key value already exists. Otherwise, true is
       returned //
    if (d=null) then return (FALSE)            // special case: d is null //
    if (T=null)                                // special case: tree is empty //
        then (T=MAKENODE((K1..Kk),d,()); return (TRUE))
    X=T; Y=null; y=0; n=0; j=0
    loop
        input node X from disk
        let X be defined by p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
        // match the prefix to the leading characters in the key //
        l=MIN(p,k-n)
        for i=1 to l do
            n=n+1
            if Kn<>Pi then return (PREFIX(d,n,(K1..Kk),i,X,y,Y))
        end
        // is the new key a subset of an existing key? //
        if n=k then (
            if l=p then (
                if D<>null then return (FALSE)
                // trivial case; replace null pointer with d //
                D=d; output X to disk; return (TRUE))
            return (SUBSTRING(d,n,(K1..Kk),l+1,X,y,Y)))
        // determine which node to process next, search branch characters //
        n=n+1
        j=1
        loop
            case
                :j>s: return (BRANCH(d,n,(K1..Kk),j,X,y,Y))
                :Kn<Bj: return (BRANCH(d,n,(K1..Kk),j,X,y,Y))
                :Kn=Bj: exit
                :else: j=j+1
            end
        forever
        // remember the branch taken, then descend to the child //
        y=j; Y=X; X=Sj
    forever
INSERT FOR PREFIX COLLISION

procedure PREFIX(d,n,(K1..Kk),i,X,y,Y)
    // a collision has occurred within the prefix portion of a node. Three new nodes will be
       formed, U, V, and W, replacing the node in which the conflict occurred, X. Kn and Pi
       were the conflicting characters. y is the subtree in Y, the parent node of X,
       which points to X //
    // assume X,Y are already in memory
       let X define p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
       let Y define Yp,Ys,(YP1..YPYp),YD,((YB1,YS1)..(YBYs,YSYs)) //
    // create new node U to hold remainder of new key and its data //
    U=MAKENODE((Kn+1..Kk),(d),())
    // create new node V to hold remainder of prefix and subtrees //
    V=MAKENODE((Pi+1..Pp),(D),((B1,S1)..(Bs,Ss)))
    // create new node W to hold common prefix and new subtrees //
    if Kn<Pi
        then W=MAKENODE((P1..Pi-1),(),((Kn,U),(Pi,V)))
        else W=MAKENODE((P1..Pi-1),(),((Pi,V),(Kn,U)))
    // replace pointer to X in Y with pointer to W, then destroy X //
    if Y=null
        then T=W
        else (YSy=W; output Y to disk)
    KILLNODE(X); return (TRUE)
end PREFIX


EP 0 419 889 A2

INSERT FOR BRANCH COLLISION

procedure BRANCH(d,n,(K1..Kk),j,X,y,Y)
    // a collision has occurred within the branch portion of a node. Two new nodes will be
       formed, U and W, replacing the node in which the conflict occurred, X. Kn is the
       character not found in the branch characters of X.
       j provides the insertion point. y is the subtree in Y, the parent node of X, which
       points to X //
    // assume X,Y are already in memory
       let X define p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
       let Y define Yp,Ys,(YP1..YPYp),YD,((YB1,YS1)..(YBYs,YSYs)) //
    // create new node U to hold remainder of new key and its data //
    U=MAKENODE((Kn+1..Kk),(d),())
    // create new node W to hold the prefix and subtrees of X, plus the new branch at position j //
    W=MAKENODE((P1..Pp),(D),((B1,S1)..(Bj-1,Sj-1),(Kn,U),(Bj,Sj)..(Bs,Ss)))
    // replace pointer to X in Y with pointer to W, then destroy X //
    if Y=null
        then T=W
        else (YSy=W; output Y to disk)
    KILLNODE(X); return (TRUE)
end BRANCH
INSERT FOR INITIAL SUBSTRING

procedure SUBSTRING(d,n,(K1..Kk),i,X,y,Y)
    // an underflow has occurred within the prefix portion of a node. Two new nodes will
       be formed, V and W, replacing the node in which the key was exhausted, X. Pi
       WOULD be the next character examined. y is the subtree in Y, the parent node of X,
       which points to X //
    // assume X,Y are already in memory
       let X define p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
       let Y define Yp,Ys,(YP1..YPYp),YD,((YB1,YS1)..(YBYs,YSYs)) //
    // create new node V to hold remainder of prefix and subtrees //
    V=MAKENODE((Pi+1..Pp),(D),((B1,S1)..(Bs,Ss)))
    // create new node W to hold common prefix and new subtree //
    W=MAKENODE((P1..Pi-1),(d),((Pi,V)))
    // replace pointer to X in Y with pointer to W, then destroy X //
    if Y=null
        then T=W
        else (YSy=W; output Y to disk)
    KILLNODE(X); return (TRUE)
end SUBSTRING

end PINSERT
E. Deletion of Nodes

The first step in deleting a node containing a given key which is to be deleted is to locate the node,
which requires matching the key to the prefix completely, and determining whether there is data associated
with the node. Thereafter, the deletion of the node depends upon the number of branch characters, that is,
the number of branches, dependent from the node.

In a first instance, there are no branch characters Bj in the node. That is, the node is a "leaf" node and
there are no other keys in the search tree formed by this node and its subtrees. In this case, the node
having the prefix which completely matches the key to be deleted is deleted and the subtree pointer and
associated branch character which point to this node are removed from the parent node, that is, from the
node containing the pointer to the node being deleted.

In the next case there is exactly one branch character in the node to be deleted. That is, the prefix
matching the key occurs as the leading characters of at least one other key held in the search tree formed
by the node to be deleted and its subtrees. The node to be deleted effectively operates as a placeholder for
the key, and all other branch points for other keys held in the tree formed of that node and its subtrees
appear in the nodes of the subtrees dependent from that node.

This key is deleted by first deleting the data record associated with the node containing the matching
prefix, that is, the data record pointed to by the D Field 36 pointer of that node. In the next step, however,
the connection or branch connecting the single child node of the node to be deleted with the remainder of
the tree must be preserved. This is accomplished by coalescing the prefix and branch character of the
node to be deleted with the prefix of the child node, thereby creating a new node to replace both the node
being deleted and the single child node dependent from that node. This new node, in effect, replaces the
node that was deleted, and is pointed to by the branch pointer of the deleted node's parent node that
originally pointed to the deleted node.

This deletion of a node having a single branch is illustrated in Fig. 5, wherein the left hand drawing
represents the original tree, and the right hand drawing the tree after the deletion of a node. As illustrated,
the tree includes a root node T 49A with two branches and thus two branch characters, "B" and "H", with
their associated pointers. The "B" branch pointer SB goes to a branch which is not involved in the deletion
operation, and which will not be discussed further. The branch dependent from the "H" branch character
and pointed to by associated pointer SH contains the keys "Hash", "HashTable", "HashTableFile" and
"HashTableList". Node T 49B contains the key "Hash", through branch character "H" in node T 49A and
prefix "ash" in its PF 34, and has a single branch, dependent from branch character "T" through associated
branch pointer ST, and a data record reference through a D Field 36 pointer. Node T 49B and key "Hash" are
to be deleted from the tree in this example.

Node T 49B's branch pointer ST is to a node T 49C, which contains the prefix "able" and two branch
characters, "L" and "F", with associated branch pointers SL and SF to nodes T 49D and T 49E respectively.
Nodes T 49D and T 49E respectively contain prefixes "ist" and "ile" and D Field 36 pointers to data
records.

In the deletion of node T 49B, the data record pointed to by T 49B's D Field 36 is located and deleted
in the first step. Thereafter, T 49B and T 49C must be coalesced so as to preserve the keys and data
record references of nodes T 49C, T 49D and T 49E, which descend from T 49B, and to maintain the links
between the parent of T 49B, that is, T 49A, and T 49C, T 49D and T 49E. As illustrated in the right hand
portion of Fig. 5, a new node T 49F containing the prefix "ashTable" is created, wherein this prefix is the
coalescence of prefix "ash" from node T 49B and "Table", that is, branch character "T" followed by prefix
"able" from node T 49C. Node T 49F has two branch
characters, "L" and "F", from node T 49C, and associated branch pointers SL and SF to, respectively, nodes
T 49D and T 49E. The branch pointer SH of T 49A pointing to the original, deleted node T 49B now points
to new node T 49F, so that the links from node T 49A through to nodes T 49D and T 49E are preserved.

In a final case of deletion of a node, the node to be deleted will have more than one branch character to
child nodes, that is, the prefix of the node to be deleted will occur as the leading characters of at least two
other keys held in the search tree formed from that node and its subtrees. In this instance, only the data is
deleted from the node, by deleting the node's D Field 36 pointer to the data record associated with the key
to be deleted. It is necessary to retain the prefix and branch characters of the node as this node forms the
branch point between the two or more keys held in the subtrees of the node.

Further description of the above node deletion operations will be found in the following exemplary
Delete Program Listing C:
PROGRAM LISTING C - NODE DELETION

procedure PDELETE(T, (K1..Kk))
    // remove the key value (K1..Kk) from the prefix search tree T.
       A tuple (i,d) is returned; i is false if K does not exist.
       Otherwise i is true and d is the data record pointer //
    if T=null then return (FALSE, null)
    X=T; Y=null; y=0; Z=null; z=0; j=0; n=0
    loop
        input node X from disk
        let X be defined by p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
        // match the prefix to the leading characters in the key //
        if k-n<p then return (FALSE, null)
        for i=1 to p do
            n=n+1
            if Kn<>Pi then return (FALSE, null)
        end
        // does the key match the prefix? //




        if n=k then (
            if D=null then return (FALSE, null)
            d=D
            case
                :s=0: call LEAF(X,y,Y,z,Z)
                :s=1: call JOIN(X,y,Y)
                :else: D=null; output X to disk     // more than one branch: remove only the data //
            end
            return (TRUE, d))
        // determine which node to process next. Search branch characters //
        n=n+1
        j=1
        loop
            case
                :j>s: return (FALSE, null)
                :Kn<Bj: return (FALSE, null)
                :Kn=Bj: exit
                :else: j=j+1
            end
        forever
        // remember the path taken, then descend to the child //
        z=y; Z=Y; y=j; Y=X; X=Sj
    forever
LEAF

procedure LEAF(X,y,Y,z,Z)
    // The key has ended in a leaf node. We will delete this node, X, and the branch character,
       subtree pointer tuple, (By,Sy), in the parent node, Y, which led us here. //
    // assume X, Y, and Z are already in memory
       let Y define p,s,(P1..Pp),D,((B1,S1)..(Bs,Ss))
       let Z define Zp,Zs,(ZP1..ZPZp),ZD,((ZB1,ZS1)..(ZBZs,ZSZs)) //
    // destroy node X //
    KILLNODE(X)
    // create new node W to hold contents of Y, minus one subtree //
    if Y=null then (T=null; return)
    W=MAKENODE((P1..Pp),(D),((B1,S1)..(By-1,Sy-1),(By+1,Sy+1)..(Bs,Ss)))
    // destroy node Y //
    KILLNODE(Y)
    // replace pointer to Y in Z with pointer to W //
    if Z=null then (T=W; return)
    ZSz=W; output Z to disk
    return
end LEAF
JOIN

procedure JOIN(X,y,Y)
    // The key has ended in a node with one subtree. We will create a new node to replace both this
       node, X, and the root node of the subtree, V. //
    // assume X and Y are already in memory
       let V define Vp,Vs,(VP1..VPVp),VD,((VB1,VS1)..(VBVs,VSVs))
       let X define p,s,(P1..Pp),D,((B1,S1))
       let Y define Yp,Ys,(YP1..YPYp),YD,((YB1,YS1)..(YBYs,YSYs)) //
    // read next node, from subtree, into memory //
    V=S1; input node V from disk
    // create new node W to hold contents of X plus V, minus one subtree //
    W=MAKENODE((P1..Pp,B1,VP1..VPVp),(VD),((VB1,VS1)..(VBVs,VSVs)))
    // destroy nodes V, X //
    KILLNODE(X); KILLNODE(V)
    // replace pointer to X in Y with pointer to W //
    if Y=null then (T=W; return)
    YSy=W; output Y to disk
    return
end JOIN

end PDELETE

While the invention has been particularly shown and described with reference to a preferred embodiment
of the method and apparatus thereof, it will be understood by those of ordinary skill in the art that
various changes in form, details and implementation may be made therein without departing from the spirit
and scope of the invention as defined by the appended claims.



Claims 

1. A prefix index tree structure for locating data records stored in a database in a data processing system
through keys related to the information stored in the data records, each node of the tree comprising:
a prefix field for storing a prefix string of length p comprised of the longest string of key characters shared
by all subtrees of the node;
a data record field for storing a reference to a data record whose key is completed by the prefix string; and,
when the prefix string is a prefix of keys stored in at least one subtree of the node, a branch field for each
distinct p+1th key character in the keys of the subtrees, wherein each distinct p+1th key character is a
branch character and each branch field comprises
a branch character field for storing the p+1th character of a key, and
a branch pointer field for storing a reference to a node of a subtree containing at least one key whose
p+1th character is the branch character.

2. The node of the prefix index tree structure of claim 1, wherein each node further comprises,
a field for storing a number equal to the number of key characters in the prefix string, and
a field for storing a number equal to the number of branch fields in the node.

3. A method for constructing a prefix index tree structure for locating data records stored in a database in a
data processing system through keys related to the information stored in the data records, comprising, for
each node of the tree, the steps of:
determining a prefix string of length p that is the longest string of key characters shared by all subtrees of
the node,
storing the prefix string in a prefix field of the node;
when there is a data record whose key is completed by the prefix string,
storing a reference to a data record whose key is completed by the prefix string in a data record field of the
node; and,
when the prefix string is a prefix of keys stored in at least one subtree of the node,
determining the branch characters for all of the keys stored in each subtree, wherein each branch character
is a distinct p+1th character of a key contained in a subtree of the node, and
creating a branch field for each branch character, and
storing the corresponding branch character in a branch character field of the branch field, and
storing a reference to a node of a subtree containing at least one key whose p+1th character is the branch
character in a branch pointer field of the branch field.
4. In a prefix index tree of claim 1,
a method for searching the prefix index tree to locate a data record using search keys related to the
information stored in the data records, comprising the steps of:
comparing a search key of length k greater than p to the prefix string of a node,
when there is no match between the search key and the prefix string,
terminating the search,
when there is a complete match between the search key and the prefix string,
reading the reference from the data record field of the node to determine the location of the data record whose
key corresponds to the search key, and,
when the initial p characters of the search key match the prefix string,
comparing the p+1th character of the search key to the branch characters of the branch fields of the node,
and
when there is no match between the p+1th character of the search key and a branch character,
terminating the search, and
when there is a match between the p+1th character of the search key and a branch character,
reading from the branch pointer field of the branch field the reference to the subtree node containing a key
whose p+1th character matches the p+1th character of the search key, and
repeating the above steps with respect to the node referenced by the branch pointer field.
5. In a prefix index tree of claim 1,
a method for inserting a new key into a node of the prefix index tree when there is a mismatch between the
key and the prefix string that occurs before the end of both the key and prefix string, comprising the steps
of:
creating a first new node containing,
in its prefix field the portion of the key occurring after the key character that caused the match of key and
original prefix string to fail, and
in its data record field a reference to the data record associated with the new key,
creating a second new node containing,
in its prefix field the portion of the original prefix string occurring after the original prefix character that
caused the match of key and original prefix string to fail, and
in its data record and branch fields the contents of the data record and branch fields of the original node,
and
creating a third new node containing,
in its prefix field the portion of the original prefix string which matched with the key,
a first branch field having
in its branch character field the key character which caused the match between the original prefix string and
key to fail, and
in its branch pointer field a reference to the first new node, and
a second branch field having
in its branch character field the character of the original prefix string which caused the match between the
original prefix string and key to fail, and
in its branch pointer field a reference to the second new node.

6. In a prefix index tree of claim 1,
a method for inserting a new key of length k into a node of the prefix index tree when the prefix string of
the node is of length p less than k and matches the initial p characters of the new key and the node
contains no branch character matching the p+1th character of the key, comprising the steps of:
creating a new node containing
in its prefix field the portion of the new key following the p+1th character of the new key, and
in its data record field a reference to the data record associated with the new key, and
in the original node,
adding a new branch field containing
in its branch character field the p+1th character of the new key, and
in its branch pointer field a reference to the new node.
7. In a prefix index tree of claim 1,
a method for inserting a new key of length k into a node of the prefix index tree when the prefix string of
the node is of length p greater than k and the initial p characters of the new key match the prefix string,
comprising the steps of:
creating a first new node containing
in its prefix field the portion of the original prefix string following the k+1th character of the original
prefix string, and
in its data record and branch fields the contents of the data record and branch fields of the original node,
and
creating a second new node in replacement for the original node, containing
in its prefix field the portion of the original prefix string that was matched by the search key, and a branch
field, containing
in its branch character field the k+1th character of the original prefix string, and
in its branch pointer field a reference to the first new node.
8. In a prefix index tree of claim 1,
a method for deleting a key from the tree, comprising the steps of:
determining the node containing the key to be deleted and the number of branch characters of the node,
when there are no branch characters in the node,
deleting the node, and
deleting the branch character and branch pointer to the deleted node from the branch field of the parent
node of the deleted node,
when the node contains more than one branch character,
deleting the data record pointer referencing the data record associated with the key to be deleted, and
when the node contains one branch character,
locate the child node referenced by the branch pointer of the single branch field of the node,
create a new prefix string for the node by coalescing the original prefix string and the prefix string of the
child node,
delete the original single branch field from the node, and
write the branch fields and data record field from the child node into the branch fields and data record field
of the node.


